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1. Introduction 

In this paper we report on the progress that we have made in the search for an optimal Krylov 
subspace method for overlap fermions. In the previous lattice conference we reported preliminary 
results on a new method, the shifted unitary orthogonal method or SUOM The method is in 
fact the three-term-recurrence specialisation of the full orthogonalisation method (FOM) in case 
of shifted unitary matrices. Likewise, the shifted unitary minimal residual method (SUMR), dis- 
covered earlier by [Q, [3|] is a short-recurrence specialisation of the generalised minimal residual 
method (GMRES). 

Our task is to solve the linear system: 

Dx = b, x,beC N , DeC NxN (1.1) 

where D is the Neuberger's Overlap operator [|J] : 

D = Cl I + c 2 V (1.2) 

Here c\ = (1 +m)/2,C2 = (1 — m)/2 and m are the bare fermion mass, V is a unitary matrix given 
by: 

V=D w (D* w D w )-i (1.3) 
where Dw is Wilson-Dirac lattice operator. 



2. Arnoldi iteration for unitary matrices 

Some time ago Rutishauser ^ observed that for upper Hessenberg unitary matrices one can 
write H = LU~ l , where L and U are lower and upper bidiagonal matrices. Applying this decom- 
position for the Arnoldi iteration, 

VQk = QkH k + h k+ \,kqk+\el (2. 1) 

one obtains: 

VQkU k = QkLk + h k +]_,kqk+\el ■ (2.2) 
Since U k ,Lk are bidiagonal matrices we arrive to the following three-term recursion: 

h+ijclk+i =Vqk — qkhk+Vqic-iUk-i^ (2.3) 

This way we obtain the unitary Arnoldi process shown in Algorithm [IJ If we denote H k = H k + 
h+\.k^k^l, it is easy to show that for the unitary Arnoldi process we get: 

H£H k = H* k H k + l 2 k+hk e k e{ = I k (2.4) 
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Algorithm 1 Unitary Arnoldi Process 

qx=b/\\b\\ 2 

for k = 1 , . . . n do 
Wk = Vq k 
if k = 1 then 

Uk-\,k = 
else 

u k -\.k = -(9t_ 1 w*)/(?J_ 1 wj fc _i) 
end if 

W&+1 =w k -q k hk + Wk-\ u k -i,k 

4+1,/fc = ||W/t+l||2 

if 4+i,/t = then 

stop 
end if 

<lk+l = Wfc+l/4+l,£ 

end for 



3. The SUOM algorithm 



Using the unitary Arnoldi process one can ask an approximate solution of 1.1 as a linear 



combination of the Arnoldi vectors Q k . This leads to solving the smaller linear system: 

(c l I k + c 1 L k U k l )y k = e l (3.1) 
The matrix in the left hand side is upper Hessenberg. In order to solve a simpler system we define 



z k = U k l y k and get: 



(ciU k + c 2 L k )z k = e\ (3.2) 



where now T k = ciU k -\-c%L k is a tridiagonal matrix. Note that the solution to the above system 
can be updated recursively and this way one can update the solution of the original system by short 
recurrences. Here we omit the details and refer the interested reader to a forthcoming publication 
[g]. The resulting algorithm is shown in Algorithm ||. Note that Dw k multiplication is redundant. 
Indeed multiplying by D both sides of the equation: 

w k = qk + u k -i k q k -i -ciUk-ik/ik-ik-iWk-i (3-3) 

and saving the Vq k and Vq k -i vectors one obtains for free the Dw k vector. 

The SUOM algorithm is algebraically optimal. This means that the algorithm constructs a 
new residual vector which is orthogonal to the Krylov subspace already in place. However, a 
more satisfying optimality is the geometric optimality, which is a feature of algorithms with a new 
residual being smaller than the previous one. In this case one seeks the solution such that the 
residual vector norm is minimal: 

x = argmin \\b — Dx\\2 (3.4) 
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Algorithm 2 SUOM algorithm 

p = ||&|| 2 ; q\ =b/p; wi =qi 

ln=q?V qi 

q = Vq\ -h\q\ 

hi = \ \qW2; qi = q/h\ 
h\ =c\ +c 2 hi 

a,\ =p//n; jci = ctiwi; r\=b — ot\Dw\ 

for k = 2,3, . . . do 

Kjfc-tt = -qf-iVqic/qt^Vqk-i 
hk = q"Vq k + u k ^i k q%Vq k -\ 
q=(V- l kk )q k + u k ^i k Vq k -\ 
h+lk = \ \qW2 



x k =x k -i + a k w k 
n = r k -i -a k Dw k 
if H^lb < tol p then 

stop 
end if 
end for 



Arnoldi process offers an orthogonal basis vectors which can be used to project the above large 
least squares problem into a smaller problem: 



The algorithm that is derived this way is a minimal residual algorithm for shifted unitary matrices. 
We call it SUOM+ since it is a different algorithm from SUMR. The latter uses Givens rotations 



4. Comparison of algorithms 

In Figure 1 we show the convergence of various algorithms as a function of Wilson matrix- 
vector multiplication number on 8 3 16 lattices at various couplings and quark masses. For the 
overlap matrix-vector multiplication we use the double pass Lanczos algorithm (without small 
eigenspace projection of Hy?) 

We show the convergence os SUOM, SUMR, Conjugate Residuals (CR), Conjugate Gradients 
on Normal Equations (CGNE) and CG-CHI. The latter is the CGNE which solves simultaneously 
the decoupled chiral systems appearing in the matrix D*D. We have preliminary results for the 
SUOM+ algorithm in one case only. 



Qk+i = q/h+ik 

hk = C\ +C2hk — C\C2hk-\Uk-\k/h-lk-\ 

GCk = —C2hk-i/hkOC k -[ 

w k = q k + u k _\ k q k _\ — ciMfc_ifc//fc-ifc_iW ( fc_i 




(3.5) 
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Figure 1: Convergence history of various solvers for quark masses m = 0.05 (upper panel) and 
m = 0.01 (lower panel) on background gauge fields at j8 = 6 (left panel) and /3 = 5.7 (right panel). 

The first observation is that for quark propagator computations SUOM, SUMR and CR are 
more efficient than CGNE and CG-CHI algorithms. This is observed by the other groups as well 
[Q]. This is why we did not run further these algorithms for smaller quark masses. 
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Another interesting observation is that CR converges neck-to-neck with SUOM and SUMR 
algorithms for moderate quark masses. As this is lowered, we see that SUOM and SUMR con- 
vergence rate becomes larger than that of CR at a certain accuracy threshold, which depends on 

P- 

Hence, the best algorithms are the optimal algorithms SUOM and SUMR. These converge in 
all cases similarly with SUMR being slightly faster, something which should be expected from the 
geometric optimality of SUMR. 

However, we see, in the only case available, that SUOM+ converges faster than SUOM and 
SUMR. We expected a different behaviour of SUOM+ from SUMR, but there are no theoretical 
grounds to expect that SUOM+ is faster than SUMR. Since the result is preliminary, further tests 
are needed to make a definite conclusion. 

5. Conclusion 

We have shown how to build a class of optimal iterative solvers for overlap fermions using 
the unitary Arnoldi process [l[ This is easier to implement using the Rutishauser decomposition [||] 
than Givens rotations [||]. 

Preliminary results show that SUOM+ algorithm may converge faster then SUMR. It is ex- 
pected that different implementations can give different results and in our case this is obvious. 
What is not obviuos is which implementation is faster and this has to be further investigated 
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